
    Causality and Association: The Statistical and Legal Approaches

    This paper discusses different needs and approaches to establishing "causation" that are relevant in legal cases involving statistical input based on epidemiological (or more generally observational or population-based) information. We distinguish between three versions of "cause": the first involves negligence in providing or allowing exposure; the second involves "cause" as shown through a scientifically proved increased risk of an outcome from the exposure in a population; and the third considers "cause" as it might apply to an individual plaintiff based on the first two. The population-oriented "cause" is the one commonly addressed by statisticians, and we propose a variation on the Bradford Hill approach to testing such causality in an observational framework, and discuss how such a systematic series of tests might be considered in a legal context. We review some current legal approaches to using probabilistic statements, and link these with the scientific methodology as developed here. In particular, we provide an approach both to the idea of individual outcomes being caused on a balance of probabilities, and to the idea of material contribution to such outcomes. Statistical terminology and legal usage of terms such as "proof on the balance of probabilities" or "causation" can easily become confused, largely due to similar language describing dissimilar concepts; we conclude, however, that a careful analysis can identify and separate those areas in which a legal decision alone is required and those areas in which scientific approaches are useful. Comment: Published at http://dx.doi.org/10.1214/07-STS234 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
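
    The paper's population-to-individual step is closely related to a standard epidemiological quantity: the attributable fraction among the exposed, (RR - 1)/RR, often offered as a first-pass "probability of causation" for an exposed individual. The sketch below computes that textbook relation; it illustrates the balance-of-probabilities idea generically and is not the authors' own test.

```python
def probability_of_causation(relative_risk: float) -> float:
    """Attributable fraction among the exposed, (RR - 1) / RR, often used
    as a first-pass estimate of the probability that an exposed
    individual's outcome was caused by the exposure."""
    if relative_risk <= 1.0:
        return 0.0  # no excess risk, so no basis for individual causation
    return (relative_risk - 1.0) / relative_risk

# Balance of probabilities (> 0.5) is met only when RR exceeds 2,
# the commonly cited "doubling of risk" threshold in legal settings:
for rr in (1.5, 2.0, 3.0):
    print(rr, round(probability_of_causation(rr), 3))
```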

    Lung cancer and passive smoking: reconciling the biochemical and epidemiological approaches.

    The accurate determination of exposure to environmental tobacco smoke is notoriously difficult. There have been to date two approaches to determining this exposure in the study of the association of passive smoking and lung cancer: the biochemical approach, using cotinine in the main as a marker, and the epidemiological approach. Typically, results of the former have yielded much lower relative risks than the latter, and have tended to be ignored in favour of the latter, although there has been considerable debate as to the logical basis for this. We settle this question by showing that, using the epidemiologically based meta-analysis technique of Wald et al. (1986) and the misclassification models in the EPA Draft Review (1990), one arrives, using all current studies, at a result which is virtually identical to the biochemically based conclusions of Darby and Pike (1988) or Repace and Lowry (1990). The conduct of this meta-analysis itself raises a number of important methodological questions, including the validity of inclusion of studies, the use of estimates adjusted for covariates, and the statistical significance of estimates based on meta-analysis of the epidemiological data. The best estimate of relative risk from spousal smoking is shown to be approximately 1.05-1.10 based on either of these approaches, but it is suggested that considerable extra work is needed to establish whether this is significantly raised.
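
    The pooling step in such a meta-analysis is typically an inverse-variance weighted average of log relative risks. The sketch below shows that generic calculation; the study values are invented placeholders, not the passive-smoking studies actually pooled in the paper.

```python
import math

# Hypothetical (RR, 95% CI lower, 95% CI upper) triples -- placeholders only.
studies = [(1.20, 0.90, 1.60), (0.95, 0.70, 1.29), (1.15, 0.85, 1.56)]

num = den = 0.0
for rr, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE from 95% CI width
    w = 1.0 / se ** 2                                # inverse-variance weight
    num += w * math.log(rr)
    den += w

pooled = num / den
se_pooled = math.sqrt(1.0 / den)
print(f"pooled RR = {math.exp(pooled):.3f}, "
      f"95% CI = ({math.exp(pooled - 1.96 * se_pooled):.3f}, "
      f"{math.exp(pooled + 1.96 * se_pooled):.3f})")
```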

    UAVs and Machine Learning Revolutionising Invasive Grass and Vegetation Surveys in Remote Arid Lands

    This is the final version of the article, available from MDPI via the DOI in this record. The monitoring of invasive grasses and vegetation in remote areas is challenging, costly, and sometimes dangerous on the ground. Satellite and manned aircraft surveys can assist, but their use may be limited by ground sampling resolution or cloud cover. Straightforward and accurate surveillance methods are needed to quantify rates of grass invasion, provide appropriate vegetation tracking reports, and apply optimal control methods. This paper presents a pipeline process to detect and generate a pixel-wise segmentation of invasive grasses, using buffel grass (Cenchrus ciliaris) and spinifex (Triodia sp.) as examples. The process integrates unmanned aerial vehicles (UAVs), also commonly known as drones, high-resolution red, green, blue (RGB) cameras, and a data processing approach based on machine learning algorithms. The methods are illustrated with data acquired in Cape Range National Park, Western Australia (WA), Australia, orthorectified in Agisoft Photoscan Pro and processed in the Python programming language with the scikit-learn and eXtreme Gradient Boosting (XGBoost) libraries. In total, 342,626 samples were extracted from the obtained data set and labelled into six classes. Segmentation results provided an individual detection rate of 97% for buffel grass and 96% for spinifex, with a global multiclass pixel-wise detection rate of 97%. The obtained results were robust against illumination changes, object rotation, occlusion, background cluttering, and floral density variation. This work was funded by the Plant Biosecurity Cooperative Research Centre (PBCRC) 2164 project, Agriculture Victoria Research, and the Queensland University of Technology (QUT). The authors would like to acknowledge Derek Sandow and the WA Parks and Wildlife Service for the logistic support and permits to access the survey areas at Cape Range National Park. The authors would also like to acknowledge Eduard Puig-Garcia for his contributions in co-planning the experimentation phase. The authors gratefully acknowledge the support of the QUT Research Engineering Facility (REF) Operations Team (Dirk Lessner, Dean Gilligan, Gavin Broadbent and Dmitry Bratanov), who operated the DJI S800 EVO UAV and image sensors and performed ground referencing. We thank Gavin Broadbent for the design, manufacturing, and tuning of a two-axis gimbal for the camera. We also acknowledge the High-Performance Computing and Research Support Group at QUT for the computational resources and services used in this work.
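
    The abstract names its toolchain (Python, scikit-learn, XGBoost). A minimal sketch of the pixel-wise multiclass classification step might look like the following; the features and labels are synthetic stand-ins, not the paper's orthorectified imagery.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # pip install xgboost

rng = np.random.default_rng(0)
X = rng.random((5000, 5))          # e.g. RGB values plus derived colour indices per pixel
y = rng.integers(0, 6, size=5000)  # six classes, as in the paper

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

clf = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
clf.fit(X_train, y_train)
print("pixel-wise accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```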

    Timing anthropogenic stressors to mitigate their impact on marine ecosystem resilience

    © 2017 The Author(s). Better mitigation of anthropogenic stressors on marine ecosystems is urgently needed to address increasing biodiversity losses worldwide. We explore opportunities for stressor mitigation using whole-of-systems modelling of ecological resilience, accounting for complex interactions between stressors, their timing and duration, background environmental conditions, and biological processes. We then search for ecological windows: times when stressors minimally impact ecological resilience, defined here in terms of risk, recovery and resistance. We show for 28 globally distributed seagrass meadows that stressor scheduling which exploits ecological windows for dredging campaigns can achieve up to a fourfold reduction in recovery time and a 35% reduction in extinction risk. Although the timing and length of windows vary among sites to some degree, global trends indicate favourable windows in autumn and winter. Our results demonstrate that resilience is dynamic with respect to space, time and stressors, varying most strongly with (i) the life history of the seagrass genus and (ii) the duration and timing of the impacting stressor.
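
    The idea of an ecological window search can be illustrated with a toy scan over candidate stressor timings. The recovery-cost curve below is invented for illustration (peak sensitivity assumed in the austral growing season); it is not the paper's seagrass model.

```python
import math

def recovery_cost(start_month: int, duration: int) -> float:
    """Toy seasonal cost: stress during the assumed growing-season peak
    (month 0 here) is taken to cost more than in autumn/winter."""
    return sum(1.0 + 0.5 * math.cos(2 * math.pi * (m % 12) / 12)
               for m in range(start_month, start_month + duration))

duration = 3  # e.g. a three-month dredging campaign
best = min(range(12), key=lambda m: recovery_cost(m, duration))
print("lowest-impact start month (0 = January):", best)
```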

    Time series prediction via aggregation: an oracle bound including numerical cost

    We address the problem of forecasting a time series meeting the Causal Bernoulli Shift model, using a parametric set of predictors. The aggregation technique provides a predictor with well-established and quite satisfying theoretical properties, expressed by an oracle inequality for the prediction risk. The numerical computation of the aggregated predictor usually relies on a Markov chain Monte Carlo method whose convergence should be evaluated. In particular, it is crucial to bound the number of simulations needed to achieve a numerical precision of the same order as the prediction risk. In this direction, we present a fairly general result which can be seen as an oracle inequality including the numerical cost of the predictor computation. The numerical cost appears by letting the oracle inequality depend on the number of simulations required in the Monte Carlo approximation. Some numerical experiments are then carried out to support our findings.
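
    Aggregation over a set of predictors is commonly done with exponential weights based on cumulative past losses; with a finite predictor set the weights can be computed exactly, and the Monte Carlo approximation discussed in the paper is needed for continuous parameter sets. A minimal sketch of the exponential-weighting step, under that finite-set assumption, follows.

```python
import numpy as np

def exponential_weights(losses: np.ndarray, temperature: float) -> np.ndarray:
    """Exponential-weight aggregation: w_j proportional to
    exp(-temperature * L_j), where L_j is predictor j's cumulative loss."""
    w = np.exp(-temperature * (losses - losses.min()))  # shift for stability
    return w / w.sum()

losses = np.array([12.4, 9.8, 15.1])       # hypothetical cumulative squared losses
next_preds = np.array([1.10, 0.95, 1.30])  # each predictor's next forecast
w = exponential_weights(losses, temperature=0.5)
print("aggregated forecast:", float(w @ next_preds))
```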

    Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species

    Appropriate inspection protocols and mitigation strategies are a critical component of effective biosecurity measures, enabling implementation of sound management decisions. Statistical models to analyze biosecurity surveillance data are integral to this decision-making process. Our research focuses on analyzing border interception biosecurity data collected from a Class A Nature Reserve, Barrow Island, in Western Australia, together with the associated covariates describing both spatial and temporal interception patterns. A clustering analysis approach was adopted, using a generalization of the popular k-means algorithm appropriate for mixed-type data. The analysis compared the efficiency of clustering using only the numerical data with that obtained after adding covariates to the clustering. Based on the numerical data only, three clusters gave an acceptable fit and provided information about the underlying data characteristics. Incorporation of covariates into the model suggested four distinct clusters, dominated by physical location and type of detection. Clustering increases the interpretability of complex models and is useful in data mining to highlight patterns describing underlying processes in biosecurity and other research areas. Availability of more relevant data would greatly improve the model. Based on the outcomes of our research, we recommend broader use of cluster models in biosecurity data, with testing of these models on more datasets to validate the model choice and identify important explanatory variables.
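
    A widely used generalization of k-means to mixed numerical/categorical data is the k-prototypes algorithm. Assuming the kmodes Python package, a sketch might look like this; the four columns are invented stand-ins, not the Barrow Island variables.

```python
import numpy as np
from kmodes.kprototypes import KPrototypes  # pip install kmodes

rng = np.random.default_rng(1)
n = 200
X = np.empty((n, 4), dtype=object)
X[:, 0] = rng.poisson(3, n).astype(float)                  # e.g. items intercepted
X[:, 1] = rng.integers(0, 365, n).astype(float)            # e.g. day of year
X[:, 2] = rng.choice(["port", "airstrip", "camp"], n)      # physical location
X[:, 3] = rng.choice(["insect", "seed", "vertebrate"], n)  # type of detection

kp = KPrototypes(n_clusters=4, init="Cao", n_init=5, random_state=1)
labels = kp.fit_predict(X, categorical=[2, 3])  # which columns are categorical
print("cluster sizes:", np.bincount(labels))
```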

    Image denoising based on nonlocal Bayesian singular value thresholding and Stein's unbiased risk estimator

    © 1992-2012 IEEE. Singular value thresholding (SVT)- or nuclear norm minimization (NNM)-based nonlocal image denoising methods often rely on precise estimation of the noise variance. However, most existing methods either assume that the noise variance is known or require an extra step to estimate it. Under the iterative regularization framework, the error in the noise variance estimate propagates and accumulates with each iteration, ultimately degrading the overall denoising performance. In addition, the essence of these methods is still least squares estimation, which can cause a very high mean-squared error (MSE) and is inadequate for handling missing data or outliers. To address these deficiencies, we present a hybrid denoising model based on variational Bayesian inference and Stein's unbiased risk estimator (SURE), which consists of two complementary steps. In the first step, the variational Bayesian SVT performs a low-rank approximation of the nonlocal image patch matrix to simultaneously remove the noise and estimate the noise variance. In the second step, we modify the conventional SURE full-rank SVT and its divergence formulas for rank-reduced eigen-triplets to remove the residual artifacts. The proposed hybrid BSSVT method achieves better performance in recovering the true image compared with state-of-the-art methods.
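
    The basic SVT operation itself is easy to state: shrink the singular values of a noisy patch matrix toward zero. The sketch below shows generic soft-thresholding SVT on a synthetic low-rank matrix; it is not the paper's variational Bayesian variant or its SURE correction.

```python
import numpy as np

def svt(M: np.ndarray, tau: float) -> np.ndarray:
    """Soft singular value thresholding: the proximal operator of the
    nuclear norm, shrinking every singular value by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((40, 5)) @ rng.standard_normal((5, 30))  # rank 5
noisy = low_rank + 0.3 * rng.standard_normal(low_rank.shape)
denoised = svt(noisy, tau=2.0)
print("noisy error:   ", np.linalg.norm(noisy - low_rank))
print("denoised error:", np.linalg.norm(denoised - low_rank))
```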

    A Bayesian Network-based customer satisfaction model: a tool for management decisions in railway transport

    We formalise and present an innovative general approach for developing complex system models from survey data by applying Bayesian Networks. The challenges and approaches to converting survey data into usable probability forms are explained, and a general approach for integrating expert knowledge (judgements) into Bayesian complex system models is presented. The structural complexities of the Bayesian complex system modelling process, based on various decision contexts, are also explained along with a solution. A novel application of Bayesian complex system models as a management tool for decision making is demonstrated using a railway transport case study. Customer satisfaction, which is a Key Performance Indicator in public transport management, is modelled using data from customer surveys conducted by Queensland Rail, Australia.
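
    The mechanical core of such a model is a set of conditional probability tables estimated from survey response frequencies. A minimal sketch with the pgmpy library, an invented two-driver structure (not Queensland Rail's actual model), and placeholder probabilities is shown below.

```python
from pgmpy.models import BayesianNetwork  # pip install pgmpy
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("punctuality", "satisfaction"),
                         ("comfort", "satisfaction")])

# CPTs would normally be estimated from survey response frequencies;
# all numbers here are illustrative placeholders.
cpd_p = TabularCPD("punctuality", 2, [[0.3], [0.7]])
cpd_c = TabularCPD("comfort", 2, [[0.4], [0.6]])
cpd_s = TabularCPD("satisfaction", 2,
                   [[0.9, 0.6, 0.5, 0.1],   # P(unsatisfied | punctuality, comfort)
                    [0.1, 0.4, 0.5, 0.9]],  # P(satisfied   | punctuality, comfort)
                   evidence=["punctuality", "comfort"], evidence_card=[2, 2])
model.add_cpds(cpd_p, cpd_c, cpd_s)

# Management query: satisfaction distribution given good punctuality (state 1)
print(VariableElimination(model).query(["satisfaction"],
                                       evidence={"punctuality": 1}))
```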

    Understanding the spatial distribution and hot spots of collared Bornean elephants in a multi-use landscape

    In the Kinabatangan floodplain, Sabah, Malaysian Borneo, oil palm and settlements have reduced and fragmented lowland tropical forests, home to around 200 endangered Bornean elephants (Elephas maximus borneensis). In this region, elephants range within forests, oil palm, and community areas, but the degree to which they use these areas remains unclear. We used GPS telemetry data from 2010 to 2020 for 14 collared elephants to map their entire known ranges and highly used areas (hot spots) across four land use categories, and to estimate time spent within each. Use of the land use types varied significantly across elephants. Females typically had strong fidelity to forests, yet many of these forests are threatened with conversion. The three males, and several females, heavily used oil palm estates, which may reflect decreased landscape permeability or foraging opportunities. At the pooled level, protected areas constituted 37% of the entire range and 34% of the hot spot extent, unprotected forests 8% and 11%, oil palm estates 53% and 51%, and community areas 2% of each. Protecting all forested habitats and effectively managing areas outside of protected areas is necessary for the long-term survival of this population.
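
    With fixes taken at a regular sampling interval, the share of GPS fixes falling in each land use category is a simple proxy for time spent there. A pandas sketch with invented column names and toy rows follows; the real analysis would first assign each fix a category by spatial overlay.

```python
import pandas as pd

# Hypothetical telemetry table: one row per GPS fix, with a land use
# category already assigned by spatial overlay (column names invented).
fixes = pd.DataFrame({
    "elephant_id": ["F1", "F1", "F1", "M1", "M1", "M1"],
    "land_use": ["protected_forest", "protected_forest", "oil_palm",
                 "oil_palm", "oil_palm", "community"],
})

# Proportion of fixes per category, per elephant (proxy for time spent)
share = fixes.groupby("elephant_id")["land_use"].value_counts(normalize=True)
print(share)
```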

    Principles of Experimental Design for Big Data Analysis

    Big Datasets are endemic, but are often notoriously difficult to analyse because of their size, heterogeneity and quality. The purpose of this paper is to open a discourse on the potential for modern decision-theoretic optimal experimental design methods, which by their very nature have traditionally been applied prospectively, to improve the analysis of Big Data through retrospective designed sampling in order to answer particular questions of interest. By appealing to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has the potential for wide generality and advantageous inferential and computational properties. We highlight current hurdles and open research questions surrounding efficient computational optimisation in using retrospective designs, and in part this paper is a call to the optimisation and experimental design communities to work together in the field of Big Data analysis. CCD was supported by the Australian Research Council's Discovery Early Career Researcher Award funding scheme (DE160100741). CH would like to gratefully acknowledge support from the Medical Research Council (UK), the Oxford-Man Institute, and the EPSRC (UK) through the i-like Statistics programme grant. CCD, JMM and KM would like to acknowledge support from the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS). Funding from the Australian Research Council for author KM is gratefully acknowledged.
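
    Retrospective designed sampling can be illustrated with a greedy D-optimal subset selection: choose rows of the Big Dataset's covariate matrix that approximately maximise det(X'X) of the selected design. The sketch below is a generic toy under that criterion, not the authors' algorithm.

```python
import numpy as np

def greedy_d_optimal(X: np.ndarray, k: int) -> list:
    """Greedily pick k rows of X to (approximately) maximise det(X'X):
    a toy version of retrospective design-based subsampling."""
    chosen = []
    ridge = 1e-6 * np.eye(X.shape[1])  # keeps early designs nonsingular
    for _ in range(k):
        best_i, best_det = None, -np.inf
        for i in range(len(X)):
            if i in chosen:
                continue
            S = X[chosen + [i]]
            d = np.linalg.det(S.T @ S + ridge)  # information if row i is added
            if d > best_det:
                best_i, best_det = i, d
        chosen.append(best_i)
    return chosen

rng = np.random.default_rng(0)
big = rng.standard_normal((10_000, 3))  # stand-in for a Big Dataset's covariates
print("selected row indices:", greedy_d_optimal(big, k=5))
```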